
Multi-objective evolution for Generalizable Policy Gradient Algorithms

Author: Co-Reyes John D.
Faust Aleksandra
Garau-Luis Juan Jose
Miao Yingjie
Parisi Aaron
Real Esteban
Tan Jie
Publication date: 08/04/2022

Performance, generalizability, and stability are three Reinforcement Learning (RL) challenges relevant to many practical applications in which they present themselves in combination. Still, state-of-the-art RL algorithms fall short when addressing multiple RL objectives simultaneously, and current human-driven design practices might not be well-suited for multi-objective RL. In this paper we present MetaPG, an evolutionary method that discovers new RL algorithms represented as graphs, following a multi-objective search criterion in which different RL objectives are encoded in separate fitness scores. Our findings show that, when using a graph-based implementation of Soft Actor-Critic (SAC) to initialize the population, our method is able to find new algorithms that improve upon SAC's performance and generalizability by 3% and 17%, respectively, and reduce instability by up to 65%. In addition, we analyze the graph structure of the best algorithms in the population and offer an interpretation of specific elements that help trade performance for generalizability and vice versa. We validate our findings in three different continuous control tasks: RWRL Cartpole, RWRL Walker, and Gym Pendulum. Comment: 23 pages, 12 figures, 10 tables

arXiv.org e-Print Archive
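The MetaPG abstract above encodes each RL objective as a separate fitness score. One standard way to select among candidates scored on several objectives is Pareto dominance; the sketch below assumes that rule and uses made-up scores for a hypothetical SAC baseline and three variants, with instability turned into a "stability" score so that higher is better on every axis. It illustrates multi-objective selection in general, not MetaPG's actual selection procedure or graph search.

```python
from typing import Dict, List

# Hypothetical fitness vector: one score per RL objective, higher is better.
Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_front(population: Dict[str, Scores]) -> List[str]:
    """Names of candidates that no other candidate dominates (insertion order kept)."""
    return [
        name
        for name, scores in population.items()
        if not any(
            dominates(other, scores)
            for other_name, other in population.items()
            if other_name != name
        )
    ]


# Made-up scores for illustration only; not results from the paper.
population = {
    "sac_baseline": {"performance": 0.80, "generalization": 0.50, "stability": 0.60},
    "variant_a": {"performance": 0.82, "generalization": 0.55, "stability": 0.60},
    "variant_b": {"performance": 0.70, "generalization": 0.70, "stability": 0.65},
    "variant_c": {"performance": 0.65, "generalization": 0.45, "stability": 0.50},
}

print(pareto_front(population))  # ['variant_a', 'variant_b']
```

Here `sac_baseline` is dominated by `variant_a`, and `variant_c` by `sac_baseline`, while `variant_a` and `variant_b` both survive because each wins on a different objective, the kind of performance-versus-generalizability trade-off the abstract describes.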

Evolving Reinforcement Learning Algorithms

Author: Co-Reyes John D.
Faust Aleksandra
Le Quoc V.
Lee Honglak
Levine Sergey
Miao Yingjie
Peng Daiyi
Real Esteban
Publication date: 26/03/2021

We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games. The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.Comment: ICLR 2021 Oral. See project website at https://sites.google.com/view/evolvingr

arXiv.org e-Print Archive
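The abstract above describes a search over computational graphs that compute the agent's loss, bootstrapped from DQN. For reference, the loss such a search starts from is DQN's squared one-step temporal-difference error. The sketch below computes it for a single transition in plain Python; it omits batching, the Huber loss often used in practice, and the graph representation itself, and all function and parameter names are illustrative.

```python
from typing import Sequence


def dqn_td_loss(
    q_values: Sequence[float],       # Q(s, .) from the online network
    action: int,                     # index of the action taken in s
    reward: float,                   # reward observed after taking the action
    next_q_target: Sequence[float],  # Q_target(s', .) from the target network
    done: bool,                      # True if s' is terminal
    gamma: float = 0.99,             # discount factor
) -> float:
    """Squared one-step TD error for a single transition, as in DQN."""
    # Terminal transitions do not bootstrap from the next state.
    bootstrap = 0.0 if done else gamma * max(next_q_target)
    target = reward + bootstrap  # DQN treats this target as a constant (no gradient)
    td_error = q_values[action] - target
    return td_error ** 2


# Target is 0.5 + 0.9 * 3.0 = 3.2, so the error is 2.0 - 3.2 = -1.2 and the loss is about 1.44.
print(dqn_td_loss([1.0, 2.0], action=1, reward=0.5, next_q_target=[1.0, 3.0], done=False, gamma=0.9))
```

Evolved variants in this setting change how the target or the error is combined; the plain form above is only the starting point.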

Discovering Representations for Black-box Optimization

Author: Bongard Josh C
Co-Reyes John D
DaCosta Luis
Diederik
Doncieux Stephane
Elsken Thomas
Higgins Irina
Jong Kenneth De
Kim Hyunjik
Rothlauf Franz
Salimans Tim
Shahriari Bobak
Publication venue: Association for Computing Machinery (ACM)
Publication date: 05/07/2020

The encoding of solutions in black-box optimization is a delicate, handcrafted balance between expressiveness and domain knowledge -- between exploring a wide variety of solutions, and ensuring that those solutions are useful. Our main insight is that this process can be automated by generating a dataset of high-performing solutions with a quality diversity algorithm (here, MAP-Elites), then learning a representation with a generative model (here, a Variational Autoencoder) from that dataset. Our second insight is that this representation can be used to scale quality diversity optimization to higher dimensions -- but only if we carefully mix solutions generated with the learned representation and those generated with traditional variation operators. We demonstrate these capabilities by learning a low-dimensional encoding for the inverse kinematics of a thousand-joint planar arm. The results show that learned representations make it possible to solve high-dimensional problems with orders of magnitude fewer evaluations than the standard MAP-Elites, and that, once solved, the produced encoding can be used for rapid optimization of novel, but similar, tasks. The presented techniques not only scale up quality diversity algorithms to high dimensions, but show that black-box optimization encodings can be automatically learned, rather than hand designed. Comment: Presented at GECCO 2020 -- v2 (Previous title 'Automating Representation Discovery with MAP-Elites')

arXiv.org e-Print Archive

Crossref
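The abstract above compares against standard MAP-Elites on a planar-arm inverse-kinematics task. A minimal sketch of that baseline loop follows, using plain Gaussian mutation rather than the learned VAE representation the paper proposes. The arm setup is an assumption modeled on the common benchmark: equal links of total length 1, joint angles limited to [-π/2, π/2], the end-effector position on a 20 × 20 grid as the behavior descriptor, and the negative variance of the joint angles as fitness (a preference for smooth arms). Ten joints keep the example fast; the paper's setting uses a thousand, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Genome = List[float]
Cell = Tuple[int, int]
LIMIT = math.pi / 2  # assumed joint-angle limit


def forward_kinematics(angles: Genome) -> Tuple[float, float]:
    """End-effector (x, y) of a planar arm with equal links of total length 1."""
    link = 1.0 / len(angles)
    x = y = theta = 0.0
    for a in angles:
        theta += a  # each joint angle is relative to the previous link
        x += link * math.cos(theta)
        y += link * math.sin(theta)
    return x, y


def fitness(angles: Genome) -> float:
    """Assumed objective: negative variance of the joint angles (smoother is better)."""
    mean = sum(angles) / len(angles)
    return -sum((a - mean) ** 2 for a in angles) / len(angles)


def cell_of(angles: Genome, bins: int) -> Cell:
    """Discretize the end-effector position in [-1, 1]^2 into a bins x bins grid."""
    def to_bin(v: float) -> int:
        return min(bins - 1, int((v + 1.0) / 2.0 * bins))

    x, y = forward_kinematics(angles)
    return to_bin(x), to_bin(y)


def map_elites(n_joints=10, bins=20, n_init=100, n_iters=5000, sigma=0.1, seed=0):
    rng = random.Random(seed)
    archive: Dict[Cell, Tuple[float, Genome]] = {}  # cell -> (fitness, elite genome)

    def try_insert(g: Genome) -> None:
        # Keep g only if its cell is empty or g beats the elite already stored there.
        c, f = cell_of(g, bins), fitness(g)
        if c not in archive or f > archive[c][0]:
            archive[c] = (f, g)

    # Seed the archive with random arms.
    for _ in range(n_init):
        try_insert([rng.uniform(-LIMIT, LIMIT) for _ in range(n_joints)])
    # Main loop: pick a random elite, mutate it, and try to insert the child.
    for _ in range(n_iters):
        _, parent = rng.choice(list(archive.values()))
        child = [min(LIMIT, max(-LIMIT, a + rng.gauss(0.0, sigma))) for a in parent]
        try_insert(child)
    return archive


archive = map_elites()
print(len(archive), "cells filled")
```

The archive keeps one elite per behavior cell, so the result is a map of diverse, locally best arm configurations rather than a single optimum; the paper's contribution is to learn an encoding from such an archive.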

Small-scale proxies for large-scale Transformer training instabilities

Author: Adlam Ben
Alemi Alex
Co-Reyes John D.
Everett Katie
Gilmer Justin
Gur Izzeddin
Kornblith Simon
Kumar Abhishek
Lee Jaehoon
Liu Peter J.
Novak Roman
Pennington Jeffrey
Sohl-Dickstein Jascha
Wortsman Mitchell
Xiao Lechao
Xu Kelvin
Publication date: 25/09/2023

Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study training stability and instability at smaller scales. First, we focus on two sources of training instability described in previous work: the growth of logits in attention layers (Dehghani et al., 2023) and divergence of the output logits from the log probabilities (Chowdhery et al., 2022). By measuring the relationship between learning rate and loss across scales, we show that these instabilities also appear in small models when training at high learning rates, and that mitigations previously employed at large scales are equally effective in this regime. This prompts us to investigate the extent to which other known optimizer and model interventions influence the sensitivity of the final loss to changes in the learning rate. To this end, we study methods such as warm-up, weight decay, and the μParam (Yang et al., 2022), and combine techniques to train small models that achieve similar losses across orders of magnitude of learning rate variation. Finally, to conclude our exploration we study two cases where instabilities can be predicted before they emerge by examining the scaling behavior of model activation and gradient norms.

arXiv.org e-Print Archive
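The second instability in the abstract above, output logits drifting away from the log probabilities, was countered in PaLM (Chowdhery et al., 2022) with an auxiliary z-loss that penalizes the squared log of the softmax normalizer. The sketch below adds that term to a single-example cross-entropy in plain Python; the coefficient 1e-4 is the value reported for PaLM and is used here only as a default, and the function names are illustrative.

```python
import math
from typing import Sequence


def log_sum_exp(logits: Sequence[float]) -> float:
    """Numerically stable log Z, where Z = sum(exp(z_i))."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def cross_entropy_with_z_loss(logits: Sequence[float], target: int, z_coeff: float = 1e-4) -> float:
    """Softmax cross-entropy plus the auxiliary penalty z_coeff * (log Z)**2."""
    log_z = log_sum_exp(logits)
    ce = log_z - logits[target]       # equals -log softmax(logits)[target]
    return ce + z_coeff * log_z ** 2  # pulls log Z toward 0


base = cross_entropy_with_z_loss([0.0, 0.0], target=0)
shifted = cross_entropy_with_z_loss([100.0, 100.0], target=0)
# Both share the cross-entropy log 2; the shifted logits pay about 1.01 extra in z-loss.
print(base, shifted)
```

Shifting all logits by a constant leaves the softmax, and therefore the cross-entropy, unchanged, but it moves log Z away from 0, so only the z-loss term responds; that unconstrained drift is what the penalty is meant to discourage.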

Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

Author: Agarwal Rishabh
Anguelov Dragomir
Bronstein Eli
Chen Xiangyu
Co-Reyes John D.
Faust Aleksandra
Fu Justin
Gulino Cole
Harb Jean
Lu Yao
Lu Yiren
Luo Wenjie
McAllister Rowan
Montali Nico
Mougin Paul
Pan Xinlei
Roelofs Rebecca
Sapp Benjamin
Tucker George
Wang Yan
White Brandyn
Yang Zoey
Publication date: 12/10/2023

Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly released, real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents.

arXiv.org e-Print Archive

Estimation of the national disease burden of influenza-associated severe acute respiratory illness in Kenya and Guatemala: a novel methodology

Author: A Coutsoudis
A Roca
Aimee Summers
AM Ropero-Alvarez
BD Gessner
Benjamin J. Cowling
C Cohen
CC Iwuji
CK Li
CO Onyango
D. James Nokes
Daniel R. Feikin
DC Burton
DJ Nokes
DR Feikin
E Azziz-Baumgartner
EL Murray
FS Dawood
Gideon Emukule
H Nair
H Razuri
Henry Njuguna
I Rudan
JA Scott
James A. Fuller
JC Kwong
JC Moisi
JM Simmerman
John McCracken
Joshua A. Mott
JP Mullooly
JR Ortiz
K Adazu
KA Lindblade
Kim A. Lindblade
L Reyes
L Simonsen
LL Hammitt
M Yazdanbakhsh
Marc-Alain Widdowson
Mark A. Katz
MO Ope
MW Weber
Mwanajuma Ngama
N Homaira
Nivaldo Linares-Perez
SA Cohen
SA Madhi
Sammy Khagayi
Sidi Kazungu
Sonja J. Olsen
Wences Arvelo
WW Thompson
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Background: Knowing the national disease burden of severe influenza in low-income countries can inform policy decisions around influenza treatment and prevention. We present a novel methodology using locally generated data for estimating this burden. Methods and Findings: This method begins with calculating the hospitalized severe acute respiratory illness (SARI) incidence for children <5 years old and persons ≥5 years old from population-based surveillance in one province. This base rate of SARI is then adjusted for each province based on the prevalence of risk factors and healthcare-seeking behavior. The percentage of SARI with influenza virus detected is determined from provincial-level sentinel surveillance and applied to the adjusted provincial rates of hospitalized SARI. Healthcare-seeking data from healthcare utilization surveys is used to estimate non-hospitalized influenza-associated SARI. Rates of hospitalized and non-hospitalized influenza-associated SARI are applied to census data to calculate the national number of cases. The method was field-tested in Kenya, and validated in Guatemala, using data from August 2009–July 2011. In Kenya (2009 population 38.6 million persons), the annual number of hospitalized influenza-associated SARI cases ranged from 17,129–27,659 for children <5 years old (2.9–4.7 per 1,000 persons) and 6,882–7,836 for persons ≥5 years old (0.21–0.24 per 1,000 persons), depending on year and base rate used. In Guatemala (2011 population 14.7 million persons), the annual number of hospitalized cases of influenza-associated pneumonia ranged from 1,065–2,259 (0.5–1.0 per 1,000 persons) among children <5 years old and 779–2,252 cases (0.1–0.2 per 1,000 persons) for persons ≥5 years old, depending on year and base rate used. In both countries, the number of non-hospitalized influenza-associated cases was several-fold higher than the hospitalized cases. 
Conclusions: Influenza virus was associated with a substantial amount of severe disease in Kenya and Guatemala. This method can be performed in most low- and lower-middle-income countries.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

FigShare
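The Methods and Findings in the influenza abstract above describe a chain of rate adjustments that can be written as simple arithmetic. The sketch below follows those steps for one age group: a base hospitalized SARI rate per 1,000 persons, a province-level adjustment, the fraction of SARI that tests positive for influenza, census population, and a scale-up to non-hospitalized cases. The single combined adjustment factor, the rule non-hospitalized = hospitalized × (1 − h) / h (with h the share of severe cases that reach hospital), and every input value are assumptions for illustration, not the paper's formulas or data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProvinceInputs:
    """Hypothetical inputs for one province and one age group."""
    population: float          # persons in the age group (census)
    base_sari_rate: float      # hospitalized SARI per 1,000 persons, from the surveillance province
    adjustment: float          # assumed multiplier for risk factors and healthcare seeking
    pct_influenza: float       # fraction of SARI with influenza detected (sentinel surveillance)
    hospitalized_share: float  # assumed share of severe cases that reach hospital


def province_cases(p: ProvinceInputs) -> Tuple[float, float]:
    """Return (hospitalized, non-hospitalized) influenza-associated SARI cases."""
    adjusted_rate = p.base_sari_rate * p.adjustment  # SARI per 1,000 persons in this province
    flu_rate = adjusted_rate * p.pct_influenza       # influenza-associated SARI per 1,000 persons
    hospitalized = flu_rate * p.population / 1000.0
    # Scale hospitalized cases up to those who did not seek hospital care.
    h = p.hospitalized_share
    non_hospitalized = hospitalized * (1.0 - h) / h
    return hospitalized, non_hospitalized


def national_cases(provinces: List[ProvinceInputs]) -> Tuple[float, float]:
    """Sum province-level estimates into national totals."""
    totals = [province_cases(p) for p in provinces]
    return sum(t[0] for t in totals), sum(t[1] for t in totals)


# Made-up inputs for two provinces; not data from Kenya or Guatemala.
provinces = [
    ProvinceInputs(2_000_000.0, 4.0, 1.5, 0.25, 0.25),
    ProvinceInputs(1_000_000.0, 4.0, 0.5, 0.5, 0.5),
]
print(national_cases(provinces))  # (4000.0, 10000.0)
```

Running the same calculation separately for children <5 years old and persons ≥5 years old, and for each base rate and year, gives the kind of ranges the abstract reports.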